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IMPORTANCE  SAMPLING  FOR  ESTIMATION 
OF  SMALL  PROBABILITIES 


INTRODUCTION 

One  method  of  describing  the  capability  of  a  signal  processing  system  is  through 
its  false  alarm  and  detection  probabilities  for  detection  applications,  or  in  terms  of 
its  error  probabilities  for  communication  applications.  When  these  probabilities  are 
not  analytically  available,  simulation  can  often  be  employed  to  estimate  them. 
However,  for  very  small  false  alarm  or  error  probabilities,  it  may  not  be  possible, 
via  direct  simulation,  to  conduct  enough  independent  trials  to  realize  reliable 
estimates  with  sufficient  stability. 

This  apparent  shortcoming  is  not  an  inherent  limitation  of  estimation,  but  is  due 
instead  to  the  discrete  counting  procedure  often  adopted  in  direct  simulation.  It  is 
possible  to  remedy  this  situation  by  using  a  “continuous”  counting  procedure, 
whereby  the  result  of  each  individual  trial  can  take  on  a  continuum  of  values,  the 
range  of  which  can  include  arbitrarily  small  probabilities.  In  addition,  the  variance 
of  the  resultant  estimate  can  be  reduced  to  arbitrarily-small  values,  even  for  a 
limited  number  of  independent  trials,  provided  that  the  proper  data-generation 
method  is  used. 

This  technique,  known  as  importance  sampling  (reference  I),  will  be  explained 
and  explored  here  by  means  of  a  particular  signal-processing  example  presented  by 
Hansen  (reference  2).  In  addition,  the  fundamental  variance-reducing  capability 
will  be  investigated  and  used  to  derive  a  better  data-generation  technique. 
Guidelines  for  choosing  good  data-generation  algorithms  will  also  be  presented. 


SIGNAL  DETECTION  EXAMPLE 

The  importance  sampling  technique  will  be  explained  by  means  of  the  following 
signal  detection  example.  Suppose  that  we  observe  N+l  samples  {xn}  of  some 
random  process.  Let  the  probability  density  function  (PDF)  of  the  observation 
vector 

X  =  (xr  x2,  ...  ,  xN+1)  (() 

for  noise-only  be  denoted  by 

p0(x)  =  {Wvt)}  for  a11  Xn  >  0  * 

n*l  l  '  ' ;  (2) 

where  (i  is  unknown;  that  is,  the  power  level  of  all  samples  is  identical  but  is 
unknown.  Also,  let  the  PDF  of  X  for  signal  present  be 
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P‘lX)  =  H  exp(‘^)}  7  “p(‘  “T1) 


for  all  x 


whore  y  is  unknow  n,  but  y  >  /?;  that  is,  the  power  level  of  the  potential-signal  sample 
xN+l  is  larger,  but  is  also  unknown. 

The  generalized  likelihood  ratio  is  derived  in  appendix  A  and  leads  to  the 
threshold  comparison  test 


ir  (xi 


+  *2  + 


Si)  h 


The  false  alarm  probability  is  given  by  the  probability  that  the  left  side  of  (4)  ex¬ 
ceeds  V  when  p(1  in  (2)  is  the  prevalent  PDF  of  X.  This  is  the  example  considered  in 
reference  2,  equations  (4)-(7). 

Analytic  evaluation  of  the  false  alarm  probability  for  test  (4)  and  PDF  (2)  is 
readily  accomplished  in  equations  (A-9)-(A-l  1)  of  appendix  A: 


r  (1  +  V/N) 

The  exact  value  of  p  in  (2)  is  irrelevant  in  test  (4),  since  the  left  side  of  (4)  is  in¬ 
dependent  of  absolute  levels;  hence  P,  A  depends  only  on  the  number  N  of  noise- 
only  samples  and  the  threshold  V.  This  is  called  a  constant  false  alarm  receiver, 
since  the  absolute  noise  level  need  not  be  known  in  order  to  realize  a  specified  false 
alarm  probability.  In  fact,  (5)  can  be  solved  directly  for  the  threshold  required  as 


=  N  ^F 


in  terms  of  the  specified  or  desired  PFA  and  the  number  of  samples  N.  Since  the 
value  of  P  is  irrelevant  in  test  (4),  we  will  set  p  =  1,  henceforth,  without  loss  of 
generality. 


DEFINITION  OF  PROBLEM 

The  general  situation  of  interest  is  depicted  in  figure  1 .  X  is  an  observation  vector 
of  M  components,  with  known  PDF  p(X).  The  processor  takes  this  collection  of  M 
samples,  X,  and  emits  a  single  quantity,  z,  according  to  transformation 

2  *  g(X)  ,  (7) 

which  is  compared  with  threshold  V.  The  known  quantities  here  are  the  input  PDF 
p(X),  the  (nonlinear)  transformation  g(X),  and  the  threshold  V.  There  may  be 
statistical  dependence  between  the  components  {x„}  of  the  observation.  Also,  the 
input  PDF  and  the  transformation  are  arbitrary  but  fixed.  (In  the  example  of  the 
previous  section,  g(X)  is  given  by  the  left  side  of  (4),  and  p(X)  is  given  by  (2).) 


J 
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Figure  1.  General  Processor  of  Observation  X 


We  want  to  evaluate  the  threshold-crossing  probability  (exceedance  probability) 


P  =  Prob  J  z  >  v}  *  Prob  |g(X) 


dX  p(X) 


(8) 


where  Rv  is  defined  as  the  region  of  X  space  where  g(X)  >  V.  If  p(X)  is  the  PDF 
p0(X)  for  noise-alone  at  the  processor  input,  then  P  is  the  false  alarm  probability, 
whereas  if  p(X)  is  the  PDF  p,(X)  for  signal-present,  then  P  is  the  detection 
probability.  We  shall  be  concerned  with  the  former  case  where  the  false  alarm 
probability  is  very  small. 

There  are  at  least  two  major  analytical  difficulties  with  the  problem  statement  in 
(8):  (a)  explicit  determination  of  the  region  Rv  may  be  very  difficult  to  achieve, 
especially  for  large  M;  (b)  evaluation  of  P  via  the  integral  in  (8)  may  be  very  difficult 
to  carry  out,  even  if  Rv  is  explicitly  specified.  For  large  M,  these  analytical  dif¬ 
ficulties  are  virtually  always  insurmountable,  except  for  special  regions  Rs  and 
special  PDFs.  Accordingly,  it  is  frequently  necessary  to  resort  to  a  simulation  to 
estimate  P.  In  this  report,  we  will  consider  the  performance  of:  a  direct  simulation; 
a  modified  simulation  indicated  by  importance  sampling;  and  some  additional 
simulations  indicated  by  the  optimum  PDF  for  importance  sampling. 
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DIRECT  SIMULATION 


Since  the  PDF  of  observation  X  is  known,  we  presume  that  we  can  generate  data 
subject  to  these  statistics.  In  particular,  suppose  we  generate,  according  to  PDF  p, 
the  i-th  observation  vector  X(i>,  statistically  independent  of  X<i>  for  j  ^  i,  for  a  total 
of  T  trials;  i.e.,  1  <  i  <  T.  Now  define  the  unit  step  function 


U(y) 


1  for  y  > 
0  for  y  < 


(9) 


Then  we  define  our  counting  function  on  the  i-th  trial  as 
h 


(10) 


That  is,  the  result  of  the  i-th  trial  is  1  or  0,  depending  on  w  hether  the  threshold  V  is 
exceeded  or  not,  respectively.  Finally,  the  estimate  of  the  desired  probability  P  is 
furnished  by  the  average  of  the  counting  function  over  the  T  independent  trials: 


(ID 


Observe  that  we  use  the  known  quantities  p(X),  g(X),  and  \  each  trial  (10). 


This  estimate  is  unbiased,  because 


Efaj}  =  EjhjfX)}  =  JdX  p(X)  hx(X)  =  /  dX  p(X)  =  P  . 

R 

v 

Here  we  used  the  facts  that  each  observation  X<0  was  generated  according  to  PDF  p, 
that  h,  is  given  by  (10),  and  relation  (8). 

The  PDFs  of  random  variables  h,  and  cr,  are  depicted  in  figure  2.  The  values  for 
the  areas  of  the  impulses  in  the  PDF  for  a,  are  given  by  the  binomial  quantity 

Qk  *  (j)  (1  -  P)T"k  Pk  at  =  f  ,  for  0  <  k  <  T  ,  (13) 

since  all  T  trials  are  independent.  The  mean  value  of  each  of  the  random  variables  is 
also  indicated  in  the  figure,  and  serves  to  point  out  the  fundamental  limitation  of 
such  a  direct  simulation.  Specifically,  the  result  h,  of  a  trial  can  never  equal  the 
desired  quantity  P,  but  can  only  take  on  the  values  0  and  1 .  The  averaging  of  T  trials 
helps  considerably,  but  if  P  is  significantly  less  than  1/T,  the  estimate  yielded  by 
random  variable  a,  is  inadequate  since  it  is  either  too  small  (0)  or  too  large  (1/T, 
2/T,...). 
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Qo 


Figure  2.  Probability  Density  Functions  for  hj  and  a, 


The  result  of  a  simulation  by  means  of  counting  function  h,  in  (10),  for  the  signal 
detection  example  in  (4), 


g(x)  =  i 


Vi 


N  (xl  +  X2  + 


V 


(14) 


with  N  =  32  and  T  =  1000,  is  presented  in  figure  3.  The  exact  result  in  figure  3  is 
that  already  given  by  (5)  and  appendix  A.  The  simulation  via  h,  was  conducted  only 
at  the  integer  values  of  V,  and  is  observed  to  limit  at  1  /N  =  10"J  before  jumping  to 
0.  None  of  the  values  of  P  for  V  >  8  can  be  accurately  estimated  via  this  direct 
simulation. 

The  variance  of  h,  is  P(l-P),  and  that  of  a,is  P(l-P)/T,  since  the  T  trials  are 
independent.  The  ratio  of  the  standard  deviation  of  a,  to  its  mean  is  ((1-P)/(PT))  ', 
which  is  small  only  if  T  is  significantly  larger  than  1/P.  As  a  comparison  case 
against  which  future  estimates  will  be  compared,  we  find  that  for 


we  have  statistics 
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E{h1 }  =  0.000792,  SD^}  =  0.0281, 

EUj}  =  0.000792,  SDtaj}  =  0.000890, 

P  =  0.000792,  Q  =  0.4S3,  Q,  =  0.359,  Q,  =  0.142,  Q3  =  0.038,... 

°  "  ‘  '  (16) 

Here.  SD  denotes  the  standard  deviation.  Thus,  the  standard  deviation  of  estimate 
a,  is  still  greater  than  its  mean  value,  even  though  an  average  of  1000  trials  has  been 
employed.  The  reason  for  this  behavior  is  because  h,  is  such  a  poor  indicator  of  its 
mean  value;  in  fact,  its  standard  deviation  is  35.5  times  greater  than  its  mean  value. 
An  alternative  counting  function  to  h,  that  is  more  closely  peaked  around  its 
average  value  must  be  found. 


IMPORTANCE  SAMPLING 

Suppose  we  generate  observation  X  according  to  alternative  PDF  p*(X),  instead 
of  the  originally  specified  p(X).  Also  let  us  use  counting  function 

h(X)  s  U(g(X)  -  V) 

p*(X)-  (1  ’ 

instead  of  (10).  Observe  that  the  same  known  quantities,  p(X),  g(X),  and  V  are 
involved  in  (17),  in  addition  to  the  yet -to-be-specified  PDF  p*(X).  Also,  h  is  no 
longer  restricted  to  just  the  values  0  or  1,  as  (10)  was,  due  to  the  scaling  p/p*.  The 
transformation  of  interest,  g(X),  and  the  threshold  V  are  not  changed  in  any  way. 

The  estimate  of  P  is  obtained  by  performing  T  independent  trials  as  earlier,  and 
averaging  the  results: 

•  *  T  £  . 

where  the  i-th  observation  X("  is  generated  according  to  alternative  PDF  p*(X),  not 
p(X). 

The  random  variables  h  and  a  are  unbiased  estimators  of  P,  since 
E(h(X) }  *  f  dX  p*(X)  h(X) 

*  /  dX  p(X)  U(g(X)  -  V)  -  /  dX  p(X)  =  P  .  (19) 

R 

v 

Observe  in  the  first  line  of  (19)  that  the  average  of  h  must  be  performed  according  to 
PDF  p*(X),  not  p(X),  since  the  data  X  was  generated  according  to  p*(X);  we  then 
employed  (17),  (10),  and  (12).  The  general  nature  of  the  PDF  of  counting  function  h 
in  (17)  is  displayed  in  figure  4.  There  could  still  be  a  non-zero  probability  of  getting 
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li  =  0,  depend  mu  on  the  ehoiee  of  p*(X)  in  (17);  however,  this  probability  ean  be 
made  mnelt  less  than  for  a  direct  simulation.  Also  there  is  a  distributed  portion  of 
the  PDF,  hopefully  peaked  near  E  J  h }  =  P. 


Since  the  T  trials  leading  to  estimate  a  in  118),  of  the  probability  P,  are 
statistically  independent,  the  variance  of  a  is  given  by 


V{a}  =  i  V{h}  =  i  [E{h2}  -  E2{h}  ] 


(20) 


We  have  already  evaluated  E  {h}  in  (19).  The  remaining  average  required  in  (20)  is 

2 

E{h2}  =  fdX  p* (X)  h2(X)  =  [ dX*— ^-U(gCX)  -  V)  , 

J  p*(X)  (21) 

which  depends  on  p*  as  well  as  p,  g,  and  V;  we  have  again  averaged  h:  according  to 
p*  in  (21),  and  used  (17).  Selection  of  p*  for  a  minimum  of  (21)  will  be  considered 
later. 


SCALING  OF  POTENTIAL-SIGNAL  SAMPLE 

The  first  example  of  importance  sampling  that  we  consider  is  the  one  in  reference 
2,  pp.  548-550.  The  alternative  PDF,  p*,  is  chosen  so  that  inputs  X,  for  which  a 
large  output  z  results  in  figure  1,  are  generated  with  an  increased  probability 
(reference  2,  p.  546).  Specifically,  instead  of  the  original  PDF  (with  f)  =  1 ) 

N+l 

p(X)  =  IT  with  P(x)  *  e'X  for  x  >  0  , 

n=l  n  (22> 

we  use,  for  data  generation,  the  alternative  PDF 
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p*(X) 


N 

*  Ti¬ 
ns! 


{pog> 


*>(¥■) 


with  K  >  1 


(23) 


Thus  the  potential-signal  sample,  \N+1,  has  been  sealed  by  K  and  will  more  often 
lead  to  satisfaction  of  the  threshold  crossing  in  (4).  Use  of  (22),  (23),  and  (14)  in  (17) 
leads  to  eounting  function 


h2(X)  -  K  exp  ^1  -  k))U(xN*!  ~  n  s)  » 


(24) 


where 


(25) 


(If  K  =  1,  (24)  reduces  to  (10),  the  direct  simulation  ease.)  The  corresponding 
estimate  of  P  is  given  according  to  (18)  as 


i  T 

4£ 


t  ~  h2(x(i)) 

1=1 


(26) 


The  result  of  a  simulation  via  h,  and  a2  in  (24)  and  (26)  is  given  in  figure  5  for  the 
comparison  case  cited  in  (15),  with  scaling  factor  K  =  6.  The  contrast  between 
figures  3  and  5  is  very  pronounced.  Now  estimates  of  P  all  the  way  down  to  10  7  are 
possible  via  use  of  h-,,  whereas  previously,  the  direct  simulation  could  not  yield 
estimates  less  than  1/T  =  10'3.  Also,  the  standard  deviation  of  the  estimates  in 
figure  5  is  observed  to  be  very  small  for  the  smaller  values  of  V,  although  it  gets 
larger  as  V  increases.  The  program  for  figure  5  is  given  in  appendix  B;  when  K  is  set 
equal  to  1 ,  the  results  given  in  figure  3  occurred. 

In  order  to  determine  the  performance  of  this  importance  sampling  procedure, 
and  to  ascertain  if  there  is  an  optimum  value  of  sealing  K,  we  evaluate  the  variances 
of  h-,  and  a2.  In  appendix  C,  the  v-th  moment  of  h:  is  evaluated.  In  particular,  there 
follows  from  (C-5), 


Since 

E  {h2  }  =  P  “  /  1y\N 

(28) 

is  independent  of  K  (as  expected),  the  variance  of  h,  is  minimized  when  (27)  is 
minimized.  There  follows  for  the  optimum  value  of  scaling  K,  from  (C-9), 
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A  table  of  the  optimum  scaling  K0  is  given  below,  along  with  the  mean  and 
minimum  standard  deviation  of  estimate  a:  defined  in  (26),  for  N  =  32  and  T  = 
1000.  For  V  =  8,  the  minimum  standard  deviation  of  or,  is  5.9  times  smaller  than  its 
mean  value,  for  example.  This  is  far  better  than  the  situation  in  (16)  for  direct 
simulation,  w  here  the  standard  deviation  of  a,  was  greater  than  its  mean.  For  larger 
V,  i.e.,low  probability  P,  the  minimum  standard  deviation  is  seen  to  become  larger 
than  the  mean  value  P.  Specifically  this  occurs  for  V  ^  18.  Thus  estimation  of  very 
low  probabilities  P  via  this  particular  importance  sampling  procedure  is  subject  to 
significant  error,  even  when  scaling  K  is  optimally  selected.  Of  course,  in  practice, 
the  optimum  value  of  K  will  not  be  known,  and  a  single  value  would  likely  be  used 
for  a  range  of  values  of  V. 


Tablet.  Statistics  of  a1  for  N  =  32,  T  =  1000 


V 

K 

P  =  E{o;} 

MinSD{<*,} 

2 

2.45 

1.44E-1 

7.30E-3 

4 

3.86 

2.3  IE-2 

1.88E-3 

6 

5.04 

4.09E-3 

4.86E-4 

8 

6.03 

7.92E-4 

1.34E-4 

10 

6.87 

1 .66E-4 

4.02E-5 

12 

7.59 

3.75E-5 

1.30E-5 

14 

8.22 

9.05E-6 

4.48E-6 

16 

8.77 

2.32E-6 

1 .65E-6 

18 

9.25 

6.28E-7 

6.43E-7 

20 

9.68 

1.79E-7 

2.64E-7 

Other  important  measures  of  the  quality  of  counting  function  h,  are  furnished  by 
its  PDF  and  exceedance  probability.  These  quantities  are  derived  in  appendix  D.  We 
find 


Prob' 


{h2  >  H}  ■  {'  *  Hi)  N  [' 


and  the  partial  exponential  series  is  (reference  3,  eq.  6.5. 1 1) 


eM(x) 


M 

E 


m=0 


(32) 


A  limiting  procedure  on  (30)  shows  that 
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Prob 


and  therefore  that 


Prob 


(33) 


(34) 


This  is  the  probability  (hat  counting  function  h,  gives  a  zero  output  for  observation 
X,  as  noted  in  the  PDF  in  figure  4. 

The  PDF  of  h,  is  given  in  (D-l  I): 

poo  -  fjpr-xyfr  [>  -  e*p(-  7a)  Vi(?a)]  f°r  0  <  H  i  *  .  I35) 

where  a  is  still  given  by  (31).  A  plot  of  this  PDF  is  presented  in  figure  6  for  N  =  32, 
V  =  8,  and  K  =  6.  Observe  that  the  ordinate  is  a  logarithmic  scale.  The  area  of  the 
impulse  at  H  =  0  is  available  from  (34)  as  .729;  this  is  far  less  than  the  impulse  at 
h,  =  0  in  figure  2(a)  with  area  1-P  =  .999208  (see  (16)).  However,  .729  is  still  a 
substantial  probability  to  be  associated  with  outputting  a  zero  from  the  counting 
function  h2.  The  PDF  in  figure  6  is  very  skewed;  in  addition  to  the  large  impulse  at 
H  =  0,  there  is  an  integrable  singularity  at  H  =  0+.  Although  figure  5  indicates 
significant  improvement  over  figure  3,  the  very  skewed  PDF  in  figure  6  indicates 
that  a  great  deal  more  improvement  should  be  possible  through  proper  choice  of 
alternative  PDF  p*. 

Although  we  could  calculate  the  PDF  of  a,  explicitly  (see  figure  2  and  eq.  13),  this 
is  not  the  case  for  «r2  here,  as  given  by  (26).  We  can  easily  calculate  the  cumulants  of 
a2,  by  means  of  (C-4),  but  calculation  of  the  PDF  would  require  the  following 
numerical  procedure:  (a)  take  the  Fourier  transform  of  PDF  (35),  thereby  obtaining 
the  characteristic  function  of  h2;  (b)  raise  this  complex  function  to  the  N-th  power; 
(c)  take  the  inverse  Fourier  transform,  thereby  obtaining  the  PDF  of  a2.  Some 
relevant  observations  on  this  procedure  are  as  follows:  the  cusp  of  (35)  at  H  =  0  + 
should  be  subtracted  out  and  transformed  analytically;  the  Fourier  transforms 
should  be  accomplished  by  employing  FFTs;  the  cumulative  distribution  of  <r2  could 
be  found  directly  instead  of  its  PDF  (see  references  4  and  5).  We  have  not  pursued 
this  particular  PDF,  but  rather  have  tried  to  improve  on  the  counting  function  h2 
instead. 


OPTIMUM  DATA  GENERATION 


The  fundamental  idea  behind  importance  sampling  was  presented  earlier  in  (17)- 
(21).  It  was  pointed  out  that  minimization  of  the  variance  of  the  estimate  a  in  (18) 
requires  minimization  of  (21)  by  choice  of  the  alternative  PDF  p*.  This  problem  is 
undertaken  in  appendix  E,  with  the  result  that  the  optimum  PDF  to  use  for  data 
generation  is 


P„C» 


p(X)/p  for  Xe(R  fl  R  ) 
0  otherwise 


(36) 
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where  the  regions  in  X-space  are  described  as 

R  :  p(X)  >  0; 

Rv  :  g(X)  >  V  .  (37) 

The  form  (36)  for  (he  optimum  PDF  is  very  illuminating.  Il  says:  generate  X 
values  only  for  w  hich  g(X)  >  V,  and  do  it  with  a  frequency  proportional  to  the  given 
PDF  p(X).  Furthermore,  it  says  not  to  generate  data  X  which  leads  to  zero  values 
for  h,  and  not  to  generate  data  X  which  would  not  have  been  generated  by  the 
original  PDF,  p(X).  Unfortunately,  the  value  of  the  proportionality  constant  in  (36) 
is  P,  the  very  quantity  we  are  trying  to  estimate.  In  addition,  determination  of  the 
region  RDR1  could  be  a  very  difficult  analytical  task. 

The  optimum  counting  function  is  shown  in  appendix  E  to  be  given  by 


hQ(X)  = 

That  is,  every  trial  X  generated  according  to  (36)  yields  exactly  the  same  value  for 
the  counting  function;  the  value  0  in  (38)  is  never  encountered  because  p„(X)  is  zero 
for  such  data  values  X. 

It  follows  that  the  variance  of  h(>  (and  the  corresponding  estimate  a-o  of  P)  is  zero. 
Thus  by  proper  choice  of  alternative  PDF  p*(X),  we  can  reduce  the  variance  of  the 
estimation  error  to  zero,  for  any  fixed  number  of  trials  T.  If  instead  of  choosing  p* 
exactly  equal  to  p0,  we  come  reasonably  close,  then  we  shall  realize  the  variance- 
reducing  capability  inherent  in  importance  sampling  (references  1,  2).  Since  the 
direct  simulation  approach  always  yields  a  zero  output  and  is  far  from  optimum,  a 
significant  improvement  in  estimation  capability  is  often  achieved  with  a  minor 
change  in  the  data-generating  PDF;  witness  the  results  of  the  previous  section  which 
simply  used  a  scaled  version  of  the  potential-signal  sample  and  made  no  use  of  the 
optimum  PDF  for  importance  sampling.  Even  though  direct  usage  of  the  optimum 
PDF  in  (36)  is  not  feasible,  it  does  furnish  some  good  guidelines,  as  noted  under 
(37).  We  shall  use  these  guidelines  in  the  next  section  to  select  some  modified  data- 
generation  PDFs  for  the  processor  g(X)  in  (14)  of  interest  here. 


P  for 


XefR  n  Ry) 


0  otherwise 


(38) 


SOME  ALTERNATIVE  DATA  GENERATION  STRATEGIES 


The  original  PDF  p(X)  is  given  in  (22).  Since  the  PDF  and  the  test  of  interest, 
(14),  involve  {xn}^  only  through  their  sum  s  defined  in  (25),  we  can  rewrite  this  PDF 
as 

exp("Vl}  for  s  »  0  »  ^+1  >  0  » 

(39) 

and  the  test  as 


Vi) 


,W-J  «-s 

(N  -  1)! 


*N*1  ^  N  3 


(40) 
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A  Shifted  PDF 


In  keeping  with  the  guidelines  presented  in  the  previous  section,  we  take  a  shifted 
function  for  the  conditional  PDF: 


P*(s,  VP  =  P*Cs)  *  p*(xN+lls) 


sN-l  e_s 

qTT-iyy  *  exP 


[-  (vi  -  N  *)] 


for  s  >  0.  Vl  >  jjs 


(41) 


This  PDF  is  non-zero  only  in  Rv  D  R,  as  desired;  however,  it  does  not  match  the 
shape  of  (39)  for  all  s,  xN+1,  as  (36)  suggests.  Then  (17)  yields  counting  function 


h3(X)  =  exp(-  J  s)  for  J 


>  sr  s  >  0 


(42) 


Furthermore,  there  is  no  need  to  generate  xN+1  since  it  is  not  involved  in  h3. 
Therefore  we  use  (42)  with  the  PDF  for  p*(s)  as  given  in  (41 ). 

The  exceedance  probability  of  h3  is  immediately  found  from  (42),  (41 ),  and  (32): 
Prob  {h3  >  H}  =  Probjexp^-  ^  >  h|  =  Probfs  <  A3> 


/ 


'  N-l  -s  -A, 

ds  (N  -  1)  !  ;=  1  '  e  e 


N-1  (A3)  for  0  <  H  <  1, 


(43) 


where 


A3  h  -  y  In  H 


(44) 


The  PDF  of  h3  is  available  from  (43)  by  taking  a  derivative  with  respect  to  FI: 

— -1  N-l 

p(H)  "  V(N  ~TT;  hV  (-  7  ln  H)  for  0  <  H  <  1  .  (45) 

The  range  (0,1)  for  h3  is  immediately  obvious  from  (42).  We  observe  there  is  no 
impulse  at  H  =  0  in  the  PDF  (45)  for  h3;  in  fact,  (43)  yields  Prob{h3  >  0}  =  1.  A  plot 
of  (45)  is  given  in  figure  7;  although  not  peaked  at  E{  h3 }  =  P  =  .000792,  it  is  con¬ 
siderably  better  than  figures  2  and  6  for  h,  and  h2,  respectively. 

The  result  of  a  simulation  via  counting  function  (42)  for  N  =  32  and  T  =  1000 
trials  is  given  in  figure  8.  As  done  earlier,  the  simulation  was  conducted  only  at  the 
integer  values  of  V,  and  straight  lines  were  drawn  between  these  estimates.  However, 
if  the  same  random  numbers  constitute  the  set  of  observations  {X(l>},r  for  all  the 
different  threshold  values  V,  as  done  in  figure  8(a),  a  very  misleading  result  and 
conclusion  is  possible;  namely,  it  appears  that  there  is  a  very  small  systematic  error  in 
the  estimate  a,  of  P.  However,  when  different  random  numbers  are  used  for  the 
simulation  at  each  value  of  V,  the  result  in  figure  8(b)  correctly  indicates  an  alter¬ 
nating  but  growing  estimation  error  at  the  lower  probabilities.  Since  in  practice,  the 
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solid  (exact)  curve  in  figures  8(a)  and  8(b)  would  not  be  available,  the  dashed  curve  in 
figure  8(a)  would  give  no  indication  of  how  reliable  the  result  was,  whereas  the 
fluctuating  result  in  8(b)  would  give  a  rough  idea  of  the  reliability  of  the  estimate, 
since  each  plotted  point  is  independent  of  its  neighbor.  The  “in-breeding”  of  the 
same  data  in  figure  8(a)  saves  time  but  can  be  a  dangerous  and  misleading  procedure. 
A  program  for  the  simulation  result  of  figure  8(b)  is  given  in  appendix  F. 


A  measure  of  the  stability  of  the  results  in  figure  8  is  afforded  by  the  variance  of 
To  determine  this  quantity,  we  first  need  v-th  moment 

/°  N-l  -s 

i  .  "  »'  f  .  r 

=  Elexpl 


E^hjfX) 


’(’  l  sv)| =  {  ds  (n  -  TT ;  “>>  (-  S  sv) 

-(— r . 


(46) 


where  we  have  used  (42)  and  (41 ).  Then  the  variance  of  h,  is 


■  (>  *  c  -  (*  *  ir 


(47) 


and  that  for  a3  is  T  times  smaller,  for  T  independent  trials.  A  table  of  the  mean  and 
standard  deviation  of  follows  below.  These  standard  deviations  are  3-4  times 
smaller  than  those  given  in  table  1 ,  which  were  for  the  optimum  scaling. 


Table  2,  Statistics  of  a3  for  N  =  32,  T  =  1000 


V 

P  =  E{<*,} 

SD{a,} 

2 

1.44E-1 

1.56E-3 

4 

2.31E-2 

5.10E-4 

6 

4.09E-3 

1.44E-4 

8 

7.92E-4 

4.11E-5 

10 

1 .66E-4 

I.23E-5 

12 

3.76E-5 

3.91E-6 

14 

9.05E-6 

1.32E-6 

16 

2.32E-6 

4.77E-7 

18 

6.28E-7 

1.82E-7 

20 

1.79E-7 

7.31E-8 

A  Gated  Conditional  PDF 


The  result  in  the  previous  subsection  was  obtained  by  modifying  conditional  PDF 
p(xN+,|s);  here  we  take  the  opposite  tack  by  modifying  p(s|xN+1).  First  define  a  gate 
function 

1  1  for  a  <  s  <  b  ] 


Us(a,  b) 


I  0  otherwise 


(48) 


18 


TR  6449 


Then  define  alternative  PDF 


-  e-xN+l  .  1. 


P*(S’  Vl)  *  P*(Vl)  '  P*(SlXN*l) 

N-l  -s 

;  -  1)  !  us(0,  V  Vi)/Dn(v  Vl)  for  Vl  >0’ 


where  denominator  DN  must  be  determined  so  that  the  conditional  PDF  has  unit 
volume;  that  is,  by  use  of  (48)  and  (32), 

N 

/„  \  rV  Vl  N-l  -s 

Dn(v  Vl)  =  L  dS  (N  -  1)  ! 


\  rV  N+l  N-l  -S 

V  Vl)  =  fQ  ds  (N  -  1)  ! 

exp(-  v  Vx)  Vi(f  Vi)  for  Vi  x  °- 


The  unit  gated  function  Us  in  (49)  keeps  p*  >  0  only  in  the  region  where  xN+1  >-j-s, 
as  was  indicated  desirable  in  the  previous  section.  The  use  of  (17),  (39),  (49),  and  (50) 
leads  to  counting  function 

h4(X)  =  h4  (S  ’  Vl)  =  Dn(v  Vl)  for  Vl  >  0  •  (51) 

Since  random  variable  s  is  not  used  in  (51),  there  is  no  need  to  generate  it;  we  use  (51) 
with  the  PDF  p*(xN+l)  =  exp(-xN+l)  for  xN+l  >0. 

The  exceedance  probability  of  h4  may  be  found  as  follows:  _ 

Prob{h4  >  H)  =  Prob^D^  x^j)  >  =  Prob|xN+1  >  J  DN(H)^ 

00 

=  J  dVl  exp(-  Vl)  =  expf  5  5N(H))  for  0  <  H  <  1  ' 

(52) 

whereby  is  the  inverse  function  to  DN,  i.e., 

DNCDN(y))  =  Y  •  (53) 

The  PDF  of  h4  is  available  through  differentiation  with  respect  to  H: 


pm  «p(-5un(h))  -  J 

v  exp[(1  -  iO  VH)] (N  ‘ 11  ■' 


exp  D„(H)) 

°n'(Vh>) 


for  0  <  H  <  1 


dnOO 


Here  we  used  the  result  of  differentiating  (53)  with  respect  to  y  and  the  derivative  of 
(50),  namely, 
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5n(dn«)  dnO->  -  1  • 

“n'w  ■  frrfrr  for  y  * 0 


(55) 


The  numerical  calculation  of  (52)  and  (54)  can  be  achieved  without  the  need  of 
calculating  the  inverse  function  t^(H).  We  employ  a  parametric  approach  by 
choosing  a  value  for  a  =  L^(H);  then  from  (52)  through  (54),  we  can  compute 


H  =  °n  ’  Prob^h4  >  H}  =  exp  .  p  (H) 


all  in  terms  of  the  parameter  a.  The  function 

DN(a)  =  1  -  exp(-a)  e^U)  (57) 

defined  in  (50)  and  (32)  must,  of  course,  still  be  evaluated. 

The  exceedance  probability  (52)  and  PDF  (54)  are  presented  in  figure  9  for  V  =  8, 
N  =  32.  There  is  a  large  undesirable  cusp  in  the  PDF  at  H  =  0  +  ,  and  a  lesser  one  at 
H  =  1-.  This  choice  of  alternative  PDF  in  (49)  gives  results  reminiscent  of  the  PDF 
for  h,  in  the  direct  simulation,  and  is  not  expected  to  be  very  useful.  A  simulation 
result  in  figure  10  confirms  this.  The  simulation  run  in  figure  10a  employed  the  same 
random  numbers  at  all  V,  for  each  of  the  1000  trials.  Although  a  very  smooth 
estimation  curve  results  in  figure  10a,  it  is  totally  misleading;  for  example,  it  indicates 
probabilities  at  V  =  14  which  are  two  orders  of  magnitude  too  small.  If  the  exact 
answer  were  not  available,  which  is  the  practical  situation,  the  smoothness  of  the 
estimate  might  give  a  false  sense  of  reliability;  in  reality,  the  smoothness  of  the 
estimated  curve  is  no  measure  of  the  accuracy  of  the  result  when  the  data  are  so 
strongly  inbred  by  being  used  repeatedly.  For  contrast,  the  simulation  in  figure  10b 
was  run  with  different  random  numbers  for  all  V,  for  each  of  the  1000  trials.  The 
extremely  large  fluctuations  in  the  estimates  for  the  lower  values  of  probability  are 
indicative  of  the  unreliability  of  this  importance  sampling  procedure. 

The  variance  of  h4  can  be  evaluated  as  follows  from  (5 1 )  and  (50): 


.rXN+l  N-l  -s 

h4  ■  J0  ds  In  t  n T 


where  r  =  N/V.  Then  using  (49),  s'e  obtain  the  mean  value  as 


(58) 


(59) 
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(a)  DISTRIBUTION  OF  h4 


(b)  DENSITY  OF  h4 

Figure  9.  Distribution  and  Density  Functions  for  h4 
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(a)  SAME  RANDOM  NUMBERS  FOR  EACH  V 


(b)  DIFFERENT  RANDOM  NUMBERS  FOR  EACH  V 


Figure  10.  Simulation  for  a  Gated  Conditional  PDF 
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in  agreement  with  (5)  as  expected.  Also,  letting  pj  )  denote  the  PDF  of  s,  and  P  (  )  its 
cumulative  distribution,  we  have 


E{hJ> 


dt  ps(s)  p$(t) 


Ps 


(t)  =  2  f 
0 


dx 


rx 


-Mj*  -  4  sr 


(60) 


where  we  temporarily  let  q  =  1  +  1/r  =  1  +  V/N.  The  variance  of  h4  is  equal  to  (60) 
minus  the  square  of  (59). 


The  mean  and  standard  deviation  of 


a. 


(61) 


are  given  in  table  3  for  N  =  32,  T  =  1000.  Comparison  with  tables  1  and  2  for  a,  and 
aj,  respectively,  reveals  that  the  performance  results  in  table  3  are  much  poorer.  In 
fact,  the  results  for  SD{o4}  are  only  2-3  times  better  than  for  the  direct  simulation 
case  ap  this  is  in  keeping  with  the  observation  made  under  (57)  regarding  the  PDF  of 
h4  in  figure  9. 
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Table  3.  Statistics  of  a4  for  N  =  32,  T  =  1000 


V 

P  =  e{*4} 

SD{*4} 

2 

1.44E-1 

9.78E-3 

4 

2.3IE-2 

3.77E-3 

6 

4.09E-3 

1.42E-3 

8 

7.92E-4 

5.44E-4 

10 

1.66E-4 

2.I5E-4 

12 

3.75E-5 

8.78E-5 

14 

9.05E-6 

3.68E-5 

16 

2.32E-6 

1.58E-5 

18 

6.28E-7 

6.93E-6 

20 

1.79E-7 

3.I1E-6 

A  Combined  Scaled  and  Shifted  PDF 

Since  counting  functions  h,  and  h,  performed  rather  well,  an  attempt  at  combining 
their  features  was  attempted.  Instead  of  the  alternative  conditional  PDF  considered  in 
(41),  we  tried 

p*(Ws>  ■  v  "p  [-  i  (Vi  -  S’ s)]  for  vi  ”  J s-  K  > 1  • 


The  counting  function  is  now  a  generalization  of  (42): 


h_  =  K  exp 


"  xn»i(i  "  k)  ■  w  s 


f0r  XN+1  >  N  S  =•  0  • 


The  v-th  moment  of  h,  is  given  by 


00 

E{h^}  =  f  ds  J  dx  p*(s,  x) 
b  J  o  0  5 

■  f0  ds  (n  -  ?):  S/Ndx  t  "P  [-  ~,~V~~]IC''  “P  [-  vx  -  v 

KV  r°°  SN-1  r  -l 

=  1  ~  v(K  -  1)  J0  ds  WT  1) :  «P  [-  s(l  +  vV/N)J 

KV 

■  - - - T7  (64) 

[l  .  v(K  -  1)]  (l  ♦  4) 

when  the  denominator  terms  are  positive.  For  v  =  1,  this  equals  (5)  as  it  should, 
independently  of  K.  For  K  >  1,  (64)  is  minimized  by  the  choice  of  scaling  K  =  I, 
regardless  of  the  values  of  V,  N,  and  v(>l).  Thus  the  minimum  variance  of  h5  is  at¬ 
tained  by  not  scaling  at  all,  and  just  using  the  shifted  PDF,  as  done  with  hv  Ac¬ 
cordingly  this  alternative  PDF  was  not  studied  any  further. 
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CONCLUSIONS 

The  importance  sampling  procedure  is  an  important  and  useful  tool  for 
estimating  small  probabilities.  Not  only  can  it  estimate  probabilities  considerably 
less  than  1/T,  where  T  is  the  number  of  independent  trials,  but  it  can  do  so  with 
arbitrarily  small  variance. 

However,  the  major  flaw  is  that  the  exact  alternative  PDF  to  use  for  data 
generation  is  not  known.  Some  guidelines  for  choosing  good  PDFs  have  been 
derived.  They  indicate  that  the  new  PDF  should  mimic  the  given  PDF  in  the  region 
where  the  original  PDF  is  positive  and  where  the  test  under  consideration  yields 
threshold  crossings.  In  fact,  one  should  use  a  PDF  which  never  generates  data  that 
lead  to  processor  outputs  less  than  the  threshold  value(s)  under  investigation.  The 
difficulty  of  satisfying  these  goals  makes  selection  of  an  alternative  PDF  more  of  an 
art  than  a  science.  Several  procedures  were  investigated  here,  and  at  least  one  gave 
remarkably  good  estimations  of  probabilities  in  the  10  7  range,  by  means  of  only 
1000  trials.  Some  other  choices  yielded  poorer  results.  It  may  be  necessary  to  try 
several  different  guesses  for  the  alternative  PDF,  and  then  select  the  best. 

The  danger  of  being  deceived  by  a  smooth  estimation  curve,  of  the  exceedance 
probability  versus  threshold,  is  great  if  one  employs  the  same  data  for  all  the 
threshold  values  considered.  Rather,  it  is  recommended  that  different  random 
numbers  be  used  for  each  threshold  considered.  Then  the  width  of  the  independent 
fluctuations  at  different  thresholds  serves  as  a  measure  of  the  reliability  of  the 
results  obtained.  Of  course,  this  additional  feature  is  achieved  at  the  expense  of 
more  computer  processing  time,  since  new  data  must  be  generated  each  time  the 
threshold  is  changed. 

Since  the  region  of  data  space  where  the  threshold  is  exceeded  depends  on  the 
threshold  value  itself,  it  may  be  necessary  to  make  different  choices  of  the  alter¬ 
native  PDF  for  each  threshold  value  of  interest.  This  drawback  is  one  of  the 
compensating  features  that  must  be  accepted  for  the  ability  to  estimate  smaF 
probabilities  with  vanishingly  small  error.  Importance  sampling  is  not  a  panacea. 
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Appendix  A 

GENERALIZED  LIKELIHOOD  RATIO 

The  PDF  for  noise-only  is  given  by  (2).  For  a  given  observation  X,  (2)  is 
maximized  by  the  choice  of  ft  as 

x  N+l 

“  N  +  l  Xn  *  (A-! ) 


The  corresponding  maximum  value  of  (2)  is 


Po00  - 


exp  (-  N  -  1) 


The  PDF  for  signal-plus-noise  is  given  by  (3);  it  is  maximized  by  the  choices 


i  JL 

"  N  5Z  xn 
n=I 


Y1  ‘  Vl 


provided  that  y,  ^  ft,.  If  xN  +  ,  <-^r  2Z  V  then  we  cannot  accept  y,  and  ft,  as  given 

by  (A-3),  because  then  we  would  have  y,  <  ft,,  which  is  inconsistent  with  the 
precondition  stated  with  (3)  that  y  >  ft.  Instead  we  would  set  y  =  ft  and  maximize 
(3),  getting 


Y1  =  B1  a  N  +  1  S,  xn  =  eo  lf  XN+1  < 


1  N 
l  T  x 

Nn=l  n 


Thus  the  maximum  value  of  (3)  is  given  by 

£*P(-  N_  1)  for  x  >  I  f  x  I 

rN  tor  XN+1  -  N  xn  I 

B1  Vl  nal  / 

m  N 

~f  (~'n "l-  1J  for  Vi  ‘  I :  x„  ) 

Bo  n*1  ' 

The  generalized  likelihood  ratio  is  given  by  the  ratio  of  (A-5)  to  (A-2): 

CLn-  Cl  H"  Cl  +  r^1  for  r>l 

GLR  ftN  X.  “  fN  *  nN+1  r  f  -  N 

B1  Vi  (N  *  1J 

and  GLR  =  1  for  r  <  I  /N,  where 


Xx  +  x2  +  ...  ♦  xN 
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Bui  l he  generalized  likelihood  raiio  in  (A-6)  is  a  monotonically  increasing  function 
of  r  for  r  ^  1/N.  Therefore  the  generalized  likelihood  ratio  test  is  equivalent  to 
comparing  r  with  a  threshold;  i. causing  (A-7),  the  detection  statistic  is 


N+l 


n(X1 


+  X 


2*  ‘  * 


*n) 


H< 


(A-8) 


where  threshold  V  >  1 . 

In  order  to  evaluate  the  false  alarm  probability  of  test  (A-8),  we  let 


s  =  Xj  +  + 


+  x„ 


Then  from  (2),  the  PDF  of  s  is 


(A-9) 


s""1  e'S 


P(s)  =  jK  ~  |y|  for  s  >  0 


(A-10) 


where  we  have  let  f)  =  1 ,  since  absolute  scale  is  irrelevant  to  test  (A-8).  Then 


FA 


Pr°b )N+1  >  *N 


w. 

/ 


tmrT  sV/N 


00 

r  / 


dx  e 


(1  +  V/N) 


N  ' 


(A-l  1) 


A-2 
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Appendix  B 

PROGRAM  FOR  SCALING  OF  POTENTIAL-SIGNAL  SAMPLE 


10  N  =  32 
20  T*  1 000 
30  K  =  6 
40  DIM  ft <20> 

50  K 1  =  1  - 1  /K. 

60  Random=SQR< . 6> 

70  RANDOM  1 2E  Random 
80  FOR  1*1  TO  T 
90  X=RND 
100  FOR  J*2  TO  N 
110  X*X*RND 
120  NEXT  J 

130  S=-LOG<X>  !  EQ  25 

140  X*-K*L0G<RND>  !  EQ  23 

150  E«EXP<-X#K1> 

160  Vc  =  I  NT  <  N*X-/S  } 

170  Vc*MIN<Vc , 20} 

180  FOR  V-0  TO  Vc 

190  A<V>*A<V>+E 

200  NEXT  V 

210  NEXT  I 

220  R=K'T 

230  FOR  V-0  TO  20 

240  R< V)*LGT  <A<V>*R> 

250  NEXT  V 

260  PLOTTER  IS  "GRAPHICS" 

270  GRAPHICS 

280  SCALE  0,20, -7,0 

290  GRID  2, 1 

300  PENUP 

310  LINE  TYPE  9 

320  FOR  V*0  TO  20 

330  PLOT  V, A< V)  !  SIMULATION 

340  NEXT  V 

350  PENUP 

360  LINE  TYPE  1 

370  FOR  V*0  TO  20 

380  PLOT  V, -N*LGT< 1+V/N>  !  EXACT 

390  NEXT  V 

400  PENUP 

410  END 
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Appendix  C 

MOMKNTS  OF  h2 

Counting  functio  •  h,  is  given  by  (24),  where  s  is  given  by  (25).  By  reference  to 
(22),  it  can  be  seen  thai  the  PDF  ol's  is 

sN-i  e-s 

P(s)  *  ^  .  f  jr  for  s  >  0  ,  (C-I) 


while  that  for  xN  , ,  is 


p(Vi)  -  exPc‘W  {0T  Vi  >  0 


(C-2) 


Since  only  the  PDF  of  random  variable  xN  +  )  is  changed  in  alternative  PDF  p*  in 
(23),  we  have 


P*(s,  xN+1)  *  (N  _  jyr  K  exP  ("  k  )  for  s  >  0  >  *N+1  *  0 


(C-3) 


Since  h2  in  (24)  is  non-zero  only  if  xN  +  ,  >  sV/N,  then  the  v-th  moment  of  h,  is  given 
by 

EKf  *  / ds  w^ttt  / d*  e>f(K>/lc)  ""  -  r)) 


sV/N 


■  TT  wr-'TV  /  d!  W'  -'IT!'  •■*[-*  j1  ‘  5  (t  * v  -  z) }] 


,V-1 


V  - 


V  -  1 


’  V  (  v  -  1\1N 

1  +  n  v  ~  -ttv 


Forv  =  1,  this  reduces  to  (28). 

The  mean  square  value  of  h2  is  given  by  substituting  v  =  2  in  (C-4): 

*8  * 


(C-4) 


n**(*-tf 


(C-5) 


We  wfant  to  minimize  this  expression  by  choice  of  scaling  K.  To  do  this,  let  t  =  1  /K, 
and  consider  the  reciprocal  of  (C-5): 

R  ■  t(2  -  t) (a  -  bt)N  ,  where  a  =  1  +  j  ,  b  5  J  .  (C.6) 


C-l 
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Setting  dR/dt  to  zero,  we  must  solve  the  equation 


(2  -  t)  (a  -  bt)  -  t(a  -  bt)  -  Nbt(2  -  t)  =  0  .  7) 

If  we  simplify  and  put  t  =  1/K(),  there  follows 

2a  -  2(a  +  b  +  Nb)  Kq  +  (N  +  2)b  =  0  . 

Solving  this  quadratic,  and  substituting  the  values  for  a  and  b  in  (C-6),  we  find  for 
the  optimum  value  of  scaling. 


(C-9) 


The  negative  square  root  is  discarded  because  it  leads  to  values  of  K1(  <  1 ,  which  are 
disallowed. 


C-2 
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Appendix  D 

DISTRIBUTION  AND  DENSITY  OF  h2 

Distribution 

We  repeat  from  (24) 

h2  =  K  exp  J  u  ^xN+1  “  n  • 

Now  h-,  =  H  when  xN  + ,  =  a,  where 

K  exp  (-  a  a  H  ;  a  =  ln(£)  F^T  . 

Also  h,  >  H  when  xN  + ,  lies  in  region  Ra  in  figure  D-l. 


Figure  D-l.  Region  R,  where  h2  >  H 


D- 


TR  6449 


Therefore,  using  (C-3)  and  figure  D-l,  we  have 


aN/V 


N-l  -s 


a 


Prob{h2>H}=  J  ds^—jyr  J  dxiexp(-i) 

sV/N 


0 

aN/V 


■  /  ds  frT  -  ITT  [“!>  (-iks)  -  exP  ('  I)] 

0 

■  (« *  I*  - 


(  "  k)  [x  "  e  2  eN-l^A2^] 


-  exp 

where  a  is  given  in  (D-2), 

,  -  /N  1  \  .  _  N 

A1  ‘  a\V  +  K  )  »  A2  =  3  V  * 

and  (reference  3,  eq.  6.5.1 1) 


(D-3) 


(D-4) 


M 


1  m 


Vx) 

m=u 

is  the  leading  terms,  through  xNI,  of  the  power  series  expansion  of  ev 


(D-5) 


As  H  —0+  ,  a  -*  +  °°  from  (D-2).  Then  A(  -►  +  <»  and  A,  -►  +  °°  from  (D-4),  and 
eN-|(Aj)  ~  AfJ'l/(N-l)!.  However,  the  exponential  expf-Aj)  dominates  this  latter 
behavior,  and  (D-3)  yields 

Prob<h2  >  0}  -  (I  ,  i)  N  .  (D-6) 

We  see,  directly  from  (D-l),  that  h;  can  never  exceed  K.  When  we  substitute 
H  =  K  in  (D-2),  we  get  a  =  0,  and  (D-3MD-5)  then  yield  Prob{h,  >  K}  =  0,  as 
expected. 

Density 

An  alternative  way  of  expressing  (D-3)  is  as  follows;  by  reference  to  figure  D-l , 

a  ,  ^/V  N-l  -s 

Prob{h2  >  H }  *  j  dx  i  exp  (-  £)  /  ds  ^  fy T 


/  dx  i  exp  (-  J)[l  -  exp  (-  £  x)  x)]  , 


(D-7) 


D-2 


r 


where  the  integrand  of  (D-7)  is  independent  of  H.  But  by  definition. 


Prob{h2  >  H }  -  f  dh2  p(h2)  , 

H 


(D-8) 


where  p(h-)  is  the  PDF  of  h,.  Selling  fhe  right-hand  sides  of  (D-7)  and  (D-8)  equal  to 
each  other,  and  differentiating  with  respect  to  H,  we  obtain 


-P(H) 


ur  it exp  (•  !)['  -  “p  (-  v a)  eN.i(? a)] 


(D-9) 


But  from  (D-2), 


3a  K  1_ 

3H  =  "  K  -  1  H 


(D-10) 


Therefore  (he  PDF  of  h,  is  given  by 

p(K)  -  (-  7‘)  eN-l(?a)]  for  0  <  H  iK-(D-U) 

As  H  -»  0  +  ,  the  bracketed  term  in  (D-l  1)  tends  to  I  since  a  -*•  +  Therefore, 

/  _i_  tiV1 

p(H)  -  yCK  -  1)  KK_1  HK-V  as  H  -  0,  . 

This  infinite  cusp  at  the  origin  is  integrable.  For  the  simulation  result  in  figure  5  for 
K  =  6,  this  yields  p( H)  ~  .  1 4/H  8  as  H  -►  0  +  . 
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Appendix  E 

DERIVATION  OF  OPTIMUM  DENSITY  FOR  p* 

Ii  is  convenient  to  define  three  regions  in  X-space;  namely, 

\  :  g(X)  >  V 

R  :  p(X)  >  0 

R*  :  p*(X)  >  0 

Now  we  define  counting  function  (more  precisely  than  (17))  as 


(E-l) 


h(X)  = 


P^8jU(g0°  "  V)  for  XeR* 
0  otherwise 


(E-2) 


Then  the  mean  value  of  h  is  obtained  by  averaging  over  p*: 

E{h(X) }  =  J  dX  p*(X)  h(X)  =  J  dX  p(X)  U(g(X)  -  V)  . 


R* 


R* 


(E-3) 


The  integrand  of  (E-3)  is  non-zero  in  region  RHRN.  In  order  to  keep  h  unbiased,  we 
henceforth  assume  that  R*D(RHR%);  for  then  E{h(X)}  =  P,  according  to  (8). 

According  to  (20)  and  (21),  we  now  want  to  minimize 

2 

E{h2(X) }  =  J  dX  p*(X)  h2(X)  =  f  dX  U(g(X)  -  V) 


R* 


by  choice  of  p*(X).  If  we  let 


R* 


(E-4) 


A(X)  =  p2(X)  U(g(X)  -  V)  for  XeR*  , 


(E-5) 


then  (E-4)  can  be  expressed  as 


E{h2}  -  f 
R* 


dX 


A(X) 


(E-6) 


A(X)  combines  all  the  given  known  quantities  in  one  expression. 


E-l 
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We  have  i he  constraint  that  the  volume  under  p*  must  he  unity  Cor  a  legal  PDF.  If 
we  let  p„(X)  be  the  optimum  value  of  p*(X),  and  perform  a  perturbation  tr;(X)  of 
P(1(X),  using  a  l.agrange  multiplier  for  the  constraint,  the  perturbed  value  of  (E-6) 
becomes 


/dxFT5fT^W- X  /dx[pom  *  en(x)]  ' 

R*  0  R*  (E-7) 

Differentiating  with  respect  to  £,  and  setting  £  =  0,  we  must  obtain  a  zero  quantity 
for  all  variations  qf.X),  in  order  for  pit(X)  to  be  the  optimum.  There  follows  for  the 
optimum  PDF 

PQ(X)  =  c(A(X))1/2  =  c  p(X)  U  (gCX)  -  V)  for  XeR*  , 

(E-8) 


where  c  is  a  positive  constant  and  we  used  (E-5).  The  right-hand  side  of  (E-8)  is  non¬ 
negative,  as  it  must  be  for  a  legal  PDF.  An  alternative  statement  of  (E-8)  is  ob¬ 
viously 


Po« 


c  p(X)  for  Xe(RHRv) 
0  otherwise 


(E-9) 


The  constant  in  (E-9)  is  determined  by  satisfying  the  constraint  of  unit  volume  for 
a  PDF: 


1  =  J  dX  pQ(X)  =  c  f  dX  p(X)  =  c  P  , 
R*  RflR 

v 

using  (8).  Thus  c  =  I /P,  giving  for  the  optimum  PDF 

Ip(X)/P  for  Xe(RHRv) 

0  otherwise 

The  minimum  mean  square  value  of  h  follows  from  (E-4)  as 

min  E{h2(X)}  =  P  j  dX  p(X)  U(g(X)  -  V) 

R* 

=  P  f  dX  p(X)  =  P2  . 

RflR 

v 

There  follows  for  the  variance  of  the  optimum  h,  namely  h0, 

V{h  }  *  E{h2}  -  E2{h  }  =  P2  -  p2  =  0  . 

o  o  o 

E-2 


(E-10) 


(E-ll) 


(E-12) 


(E-13) 
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Use  of  (E-l  1)  in  (E-2)  shows  that  the  optimum  counting  function  is 


hc(x) 


P  for  Xe(RftRv) 
0  otherwise 


(E-14) 


That  is,  every  trial  generated  according  to  optimum  PDF  p0(X)  yields  the  same 
value  for  h„,  namely  P.  The  value  0  is  never  generated  because  p(1(X)  is  zero  for  such 
data  values  X. 
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Appendix  F 

PROGRAM  FOR  A  SHIFTED  PDF 


IS  N  *  3  2 

20  T=100O 

30  DIM  A<20> 

40  Random*SQR< . 6> 

50  RANDOMIZE  Random 

60  FOR  1*1  TO  T 

70  FOR  V=0  TO  20 

80  X=RND 

90  FOR  J *2  TO  N 

100  X=X*RND 

110  NEXT  J 

120  S  =  -L0G < X >  !  EQ  41 

130  A<V>=A<V)+EXP<-V*S/N)  !  EQ  42 

140  NEXT  V 

150  NEXT  I 

160  R  = 1 / T 

170  FOR  V=0  TO  20 

180  A<V>*LGT<A<V)#R> 

190  NEXT  V 

200  PLOTTER  IS  “GRAPHICS*1 

210  GRAPHICS 

220  SCALE  0,20, -7,0 

230  GRID  2,  1 

240  PENUP 

250  LINE  TYPE  9 

260  FOR  V*0  TO  20 

270  PLOT  V, A< V)  !  SIMULATION 

280  NEXT  V 

290  PENUP 

300  LINE  TYPE  1 

310  FOR  V*0  TO  20 

320  PLOT  V, -N*LGT< l+V/N)  !  EXACT 

330  NEXT  V 

340  PENUP 

350  END 
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